Asynchronism of the spreading dynamics underlying the bursty pattern
Wang Tong1, 3, Zhou Ming-Yang2, †, Fu Zhong-Qian1
Department of Electronic Science and Technology, University of Science and Technology of China, Hefei 230027, China
Guangdong Province Key Laboratory of Popular High Performance Computers, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China
Department of Physics, University of Fribourg, Chemin du Musée 3, Fribourg, CH-1700, Switzerland

 

† Corresponding author. E-mail: zhoumy2010@gmail.com

Project supported by the National Natural Science Foundation of China (Grant Nos. 61703281, 11547040, 61803266, 61503140, and 61873171), the PhD Start- Up Fund of Natural Science Foundation of Guangdong Province, China (Grant Nos. 2017A030310374 and 2016A030313036), the Science and Technology Innovation Commission of Shenzhen, China (Grant No. JCYJ20180305124628810), and the China Scholarship Council (Grant No. 201806340213).

Abstract

The potential mechanisms of the spreading phenomena uncover the organizations and functions of various systems. However, due to the lack of valid data, most of early works are limited to the simulated process on model networks. In this paper, we track and analyze the propagation paths of real spreading events on two social networks: Twitter and Brightkite. The empirical analysis reveals that the spreading probability and the spreading velocity present the explosive growth within a short period, where the spreading probability measures the transferring likelihood between two neighboring nodes, and the spreading velocity is the growth rate of the information in the whole network. Besides, we observe the asynchronism between the spreading probability and the spreading velocity. To explain the interesting and abnormal issue, we introduce the time-varying spreading probability into the susceptible-infected (SI) and linear threshold (LT) models. Both the analytic and experimental results reproduce the spreading phenomenon in real networks, which deepens our understandings of spreading problems.

1. Introduction

In real systems, a mass of biological substance and virtual information spread among different social groups, such as causative agents,[1,2] plausible rumors,[3,4] and marketing messages.[5,6] Complex network is a simple but effective approach to the studies of ubiquitous propagation phenomena,[7,8] and has witnessed fruitful achievements in computer science, epidemiology, and other areas. For instance, the massive data collected from the online systems have revealed the underlying spreading mechanisms on the global scale,[9] helping manage the information diffusion[10,11] and better understand the spreading influence of nodes.[1214] For some infrastructure and financial systems, studying the intricate interaction patterns between nodes conduces to the maintenance of perfect functions and resisting the risks.[1517] Besides, another striking example is the critical threshold of the spreading probability below which the spreading process will spontaneously die away, otherwise it endangers the whole population.[18] It enlightens better immunization strategies by improving the critical value.[19,20] Therefore, the spreading issue provides a convenient access to understand the interplay between the network structures and functions.

Early works on the simple epidemic models solved the critical conditions for different regimes[2124] and proved the non-trivial effects of the network structure and individual behaviors on spreading dynamics.[2527] More recently, empirical researches developed classical models to describe the real spreading process. Min et al. combined the heterogeneous human activity patterns with the susceptible-infected (SI) model, resulting in the power-law decreasing spreading velocity in the long-time limit.[28] Gernat et al. extended the deterministic SI model on temporal networks, where the evolving contacts were comparable to the spreading dynamics. Compared with the randomized reference networks, they concluded that the inherent burstiness of the social interactions and the accelerated spreading process could co-occur in the bee communication network.[29] Wu et al. proposed a susceptible-accepted-recovered (SAR) model that considers information sensitivity and social reinforcement. The model reproduced main features of the real spreading events on Sina Weibo, the extremely fast or slow spreading process, by theoretical analysis and numerical simulations.[30] Vicario et al. found that the diffusion of conspiracy and scientific information among Facebook users shared the similar consumption patterns but showed different cascade dynamics.[31] To mimic the rumor spreading, they proposed a data-driven percolation model that demonstrates the decisive roles of homogeneity and polarization in predicting the cascade size. Mei et al. proposed a heterogeneous multi-stage model to investigate the impacts of the social reinforcement on spreading of information,[32] which hinders the diffusion process but facilitates the emergence of some hot spots. Besides, they maximized the spreading process with limited control sources by using Pontryagin’s maximum principle. Although the above-mentioned works have shed light on the real-world spreading mechanisms, there is still a lack of the knowledge of the evolution of the infection intensity and reaction rate of real spreading processes.

In this paper, we explore the information diffusion of real events on networks Twitter and Brightkite, and find that the spreading probability changes over time, which violates the constant transferring rate of standard spreading models. Beyond that, the empirical process shows the asynchronism that the maximal values of spreading probability and spreading velocity do not always occur simultaneously. To explain the phenomenon, we propose the improved SI and LT models based on the time-varying spreading probability. The analytic and experimental results show the time difference between the spreading probability and the spreading velocity, demonstrating the consistency of the empirical analysis and the proposed methods.

2. Materials and methods
2.1. Dataset

We use two datasets to study the empirical spreading process. Higgs recorded the activities of users on Twitter spanning from 1st to July 7th 2012,[33] before and after the official announcement of the Higgs boson on July 4th. Table 1 lists an example of the dataset, where RT and RE mean the retweeting and replying actions. It specifies the spreading paths on the network, which is composed of the friend/follower relationships between 456626 users all over the world. Therefore, we can discuss the paths of the spreading process. In addition, Brightkite is a location-based social network where users can share their locations to friends.[34] In Brightkite, each location denoted by the longitude and latitude is treated as an independent spreading event. We investigate the different spreading events and concentrate on the check-in information of site (40.440625, –79.995886) from June 19th, 2008 to June 13th, 2009. Table 2 lists the statistical properties of the friendship network of Twitter and Brightkite.

Table 1.

An example of user activities on Twitter. RT represents retweet and RE means reply.

.
Table 2.

The structural properties of the friendship networks, including the network size N, number of edges |E|, degree heterogeneity H = 〈 k2 〉/〈 k2, average clustering coefficient 〈 C 〉, and network diameter D.

.
2.2. The proposed time-varying SI model

We use the SI model to analyze the empirical process on the social networks, where each node has two states: the infected (I) state and the susceptible (S) state. Initially, all nodes are in S state except a few particular infected ones. On Twitter, when a user receives the information, the user either accepts it or rejects (ignores) it. If the user retweets the information, we consider that it will be activated and become into the infected state. Similarly, the users on Brightkite can check in a place and share the location. If a user visits the same place according to his friends’ check-in information, the user will turn into the infected state. The spreading processes on Twitter and Brightkite are governed by[35]

where xi(t) is the infection probability of node i, the entry aji of the adjacency matrix A represents the connections between nodes i and j, and β is the spreading rate and is treated as constant in the classical model.

In the empirical analysis, it is difficult to calculate the transferring rate since each user may receive the information from multiple spreaders with introduces nonlinear phenomena. To simplify this issue, we evaluate the transferring rate as

where the denominator R is the amount of the susceptible users that connect to the infected ones, and n is the number of newly infected nodes per unit time. The spreading velocity is evaluated by the fraction of the newly infected nodes. Figure 1 specifies the information flows and the calculation of the spreading probability, where the infected nodes are marked red. At time t1, the initial spreader node 1 infects two (nodes 2 and 3) of four neighbors (out-degree). The spreading probability is calculated as 2/4 = 0.5. At the next time step t2, there are five susceptible nodes connecting to the infected nodes 1, 2, and 3, but only two of them (nodes 4 and 5) get infected. Accordingly, the spreading probability is 2/5 = 0.4.

Fig. 1 An illustration of the information contagion and the calculation of the spreading probability. The infected nodes are marked red and the arrow represents the direction of the information flows.

By tracking the propagation paths, figures 2(a) and 2(c) plot the evolution of the information about the new particle Higgs boson on Twitter, where the time resolution of the process is set as four hours. Initially, there were only 121 users posting the information about the Higgs boson. The spreading probability achieved the largest value on July 4th at 8:00 because of the authoritative declaration of the Higgs boson,[36,37] followed by the maximum of the spreading velocity at the next moment 12:00. Finally, the process ended up with more than 53 % participants of the total users. We denote the peak time of the spreading probability and the spreading velocity as tβ and tv. For the spreading pattern on Twitter, Δ t = tvtβ > 0 indicates that the maximal spreading probability occurs earlier than the maximal spreading velocity. Whereas on Brightkite, we observe that the maximal spreading probability occurs later [Figs. 2(b) and 2(d)], namely, Δt < 0. Unfortunately, the classical SI model in Eq. (1) cannot explain the phenomena. Besides, the empirical analysis reveals the bursty pattern of the spreading probability and the spreading velocity, which also violates Eq. (1) of the SI model. To explain the phenomena, we propose the advanced SI model based on the time-varying spreading probability β(t), where the node state is described by

The spreading velocity is equal to the expectation of being infected for each node i, i.e., , where i(t) represents the density of the infected nodes. Equation (3) is nonlinear and cannot be calculated explicitly.

Fig. 2 The spreading probability and the spreading velocity of the empirical processes. Δt indicates the time gap between the maximal spreading probability and the maximal spreading velocity. Panels (a) and (c) show the spreading dynamics on Twitter. Panels (b) and (d) plot the spreading dynamics of the location information on Brightkite.

Initially, most nodes are in S state, xi(t)→ 0. Neglecting the higher-order xi(t)xj(t), we evaluate the early stage of the spreading dynamics in Eq. (3) by

followed by the matrix formalism

where x(t) = [x1(t),x2(t),…,xN(t)]T is the state vector of all nodes. Then, the infection probability is obtained as

where Cr is a constant, λr is the rth eigenvalue, and vr denotes the corresponding eigenvector. If the largest eigenvalue λ1 is much larger than the others, the infection probability of each node roughly follows

The change rate in the infected probability of each node is

Finally, the peak time of the spreading probability and the spreading velocity can be obtained by

Based on Eq. (8), when dβ(t)/dt = 0, the change rate in the spreading velocity d2i(t)/dt2 is larger than 0. The results indicate the asynchronism of the spreading dynamics.

2.3. The linear threshold model

Apart from the SI model, the linear threshold model has also been widely used to imitate the information spreading dynamics on social networks. Here, we use the LT model to show that the results of this paper are not limited to the SI model and also hold on other spreading models. The SI model emphasizes the contact transmission in the epidemic process, while the LT model focuses on the accumulative effects of interactions and the collective behaviors of social groups.[38] In the LT model, every node is in one of the two states (S and I). Except for a few initial spreaders, the others are susceptible and endowed with the threshold values from the given distribution. At each time step, the susceptible nodes will turn into the infected state if the fraction of infected neighbors exceeds the threshold value. In some researches, the term ‘threshold’ sometimes refers to the critical condition of spreading processes. To avoid ambiguity, we use the personalized resistance to characterize the intrinsic threshold of the nodes.

The resistance has a significant influence on the spreading dynamics. We calculate the resistance hi of the user i who participates in the spreading of Higgs boson on Twitter[39]

where Ni is the number of the infected neighbors when node i forwards the information, and ki is the degree of node i. Figure 3(a) shows the distribution of resistance with the bin interval 0.05, and figure 3(b) plots the average in-degree versus the resistance. The fitted line suggests that the resistance is negatively correlated with the in-degree of the nodes. According to the spreading process on the scale-free network, the peripheral small-degree nodes firstly infect the large-degree ones, and then the large-degree nodes transmit the information to the whole network across the small classes.[14,40] Obviously, the larger the degree nodes, the lower the resistance, and they greatly accelerate the spreading process on Twitter.

Fig. 3 The resistance distribution and the average in-degree versus the resistance. Panel (a) shows the histogram of the resistance distribution with the bin width 0.05. Panel (b) shows the average in-degree of nodes in the corresponding resistance intervals. The fitted curve reflects the negative correlation between the average in-degree and resistance of nodes.

With the assumption that the negative correlation leads to the asynchronism of the spreading process, we implement the LT model by the following steps:

Firstly generate the random sequence ϕ = [ϕ1,ϕ2,…,ϕN] following the given distribution. The smaller resistance is assigned to the larger degree node with the probability pi = ki/Σjkj.

Select a proportion p of the nodes as the initial spreaders.

At each step, the susceptible node observes the state of its neighbors and turns in to the infected state when the fraction of infected neighbors Ni/ki exceeds the resistance.

Repeat step 3 until the spreading process stops.

3. Results

For the advanced SI model, we use the Gaussian-like distribution to approximate the bursty pattern and the dynamical spreading probability in Fig. 2,

where μ and δ, respectively, represent the peak time and the standard deviation of the spreading probability.

We first simulate the spreading process on the networks based on Eq. (3). Figure 4(a) plots the change rate in the spreading velocity for different δ and μ on Twitter, where a = 0.1 and p = 0.1 %, which suggests that both parameters δ and μ influence the asynchronism little. Compared with the peak time of the spreading probability at t = μ, all the curves present the time lag of the maximal spreading velocity (d2i(t)/dt2 = 0). Without loss of generality, we set μ = 4 and δ = 2. Besides, figure 4(b) shows the influence of a and p on the time gap δt between the maximum of the spreading probability and the spreading velocity. When the parameters are relatively small, the spreading velocity is postponed (δ t > 0). With the increase of a and p, the time gap gradually decreases to –1, meaning that the spreading velocity precedes the spreading probability. In general, the numerical solutions validate the analytic results of the advanced SI model and the asynchronism on the real-world networks. Note that the selection of other parameters is closely related to the characteristics of the empirical results. For example, the process on Twitter started with a small number of spreaders and showed the lower spreading probability. Therefore, we consider the relatively small a and initial infected density p for the simulated process, while the parameters for that on Brightkite are higher. We argue that the results are not limited to the parameter settings. Apart from the Gaussian-like distribution, other β(t) that shows obvious peak is also applicable for the asynchronism of the spreading process.

Fig. 4 Numerical simulations of the advanced SI model. The spreading probability follows different Gaussian-like distributions. Panel (a) plots the change rate in the spreading velocity on Twitter as functions of δ and μ, where the initial infected density p = 0.1 % and a = 0.1. Panel (b) plots the time gap δ t for different a and p on Brightkite.

Furthermore, we simulate the advanced SI model under different conditions. Figures 5(a) and 5(b) plot the spreading probability of the simulated processes on Twitter and Brightkite. Due to the multiple contacts with the infected nodes, the simulated probabilities are much larger but present similar trend to the given β(t), which have the peak values at t = μ = 4 (the dashed line). On Twitter, figure 5(c) plots the maximal spreading velocity at t = 5, where a = 0.1 and p = 0.1 %. For Brightkite, we consider the higher spreading probability (av = 0.3) and different amount of initial spreaders p = 0.1 % and 2.5 %. Figure 5(d) shows that bigger p enhances the spreading process in the early steps and puts ahead the peak time of the spreading velocity. The time gap Δ t = –1 and Δ t = 1 clearly illustrate two types of time difference between the spreading probability and the spreading velocity.

Fig. 5 The spreading dynamics of the advanced SI model. Panels (a) and (b) show the spreading probability on Twitter and Brightkite. Panels (c) and (d) plot the spreading velocity of the corresponding spreading process.

To verify the impacts of the resistance on the spreading process, figure 6 plots the spreading dynamics by repeatedly running the LT model. In Fig. 6(a), with very few initial spreaders (p = 0.05 %), the spreading process on Twitter gradually dies down when the resistance of each user is randomly drawn from the uniform distribution U(0,1). The spreading velocity is close to 0. In contrast, figure 6(b) shows the outbreak pattern where the spreading velocity reaches the peak value later. Figures 6(c) and 6(d) plot the results of the same experiments on Brightkite. The difference is the much higher p = 1 %, which should be responsible for the time gap Δ t = –1 and the empirical behaviors on Brightkite. Notice that, this case really but rarely exists, because driving a larger number of people to initiate the spreading process is always costly in real scenarios. Besides, another condition is that the resistance is positively correlated with the in-degree of the nodes. We can imagine that the spreading process will stop in a very short time, because the large-degree nodes who always play important roles in spreading processes are hardly to get infected and send out the information or virus.

Fig. 6 The spreading dynamics obtained by the process of the LT model. The initial density of spreaders is p = 0.05 % on Twitter and p = 1 % on Brightkite. Panel (a) shows the decaying spreading process on Twitter when the resistance of each node is random from the uniform distribution. Panel (b) plots the bursty pattern and the asynchronism of the spreading process when the resistance is negatively correlated with the in-degree of node. Panels (c) and (d) show the counterparts on Brightkite.
4. Conclusion

Our work mainly explores the empirical spreading process on Twitter and Brightkite based on the propagation paths. The empirical analysis reveals the bursty spreading pattern, where we observe the asynchronism underlying the spreading process. However, the classical models cannot capture the time difference between the peak values of spreading probability and spreading velocity. By considering the time-varying spreading probability, we propose the improved SI and LT models to investigate the abnormal phenomenon, which provide analytical explanations and replicate the time difference that commonly exists in the real spreading process.

Understanding the spreading phenomena is of great significance to the diffusion efficiency. The asynchronism helps us determine better time to regulate the spreading process, making the promotions or updated innovations reach more people. Besides, the spreading paths suggest the temporal behaviors of users, especially during the outbreak period, which provide insights into the influence of users instead of purely structural information. Hence, our work not only enriches the knowledge of the complicated spreading mechanisms but also has widely potential applications to dissemination management and leadership identification.

Reference
[1] Yang Q Shi W Zhang L Xu Y Xu J Li S Zhang J Hu K Ma C Zhao X Li X Liu F Tong X Zhang G Yu P Pybus O G Tian H 2018 Emerg. Infect. Dis. 24 1095
[2] Lam T T Y Zhou B Wang J et al. 2015 Nature 522 102
[3] Mustafaraj E Metaxas P T 2017 Proceedings of the 2017 ACM on Web Science Conference 2017 New York, USA 235 10.1145/3091478.3091523
[4] Shin J Jian L Driscoll K Bar F 2017 New Media & Society 19 1214
[5] Karsai M Iñiguez G Kikas R Kaski K Kertész J 2016 Sci. Rep. 6 27178
[6] Leskovec J Adamic L A Huberman B A 2007 ACM Trans. Web 1 5
[7] Wang W Liu Q H Liang J Hu Y Zhou T 2019 Phys. Rep. 820 1
[8] Zhang Z K Liu C Zhan X X Lu X Zhang C X Zhang Y C 2016 Phys. Rep. 651 1
[9] Vosoughi S Roy D Aral S 2018 Science 359 1146
[10] He Z Cai Z Yu J Wang X Sun Y Li Y 2017 IEEE T. Veh. Technol. 66 2789
[11] Budak C Agrawal D Abbadi A E 2011 Proceedings of the 20th international conference on World wide web 2011 Hyderabad, India 665 10.1145/1963405.1963499
[12] Ruan Y R Lao S Y Wang J D Bai L Hou L L 2017 Acta Phys. Sin. 66 208901 in Chinese
[13] Lv L S Zhang K Zhang T Ma M Y 2019 Chin. Phys. 28 020501
[14] Kitsak M Gallos L K Havlin S Liljeros F Muchnik L Stanley H E Makse H A 2010 Nat. Phys. 6 888
[15] Buldyrev S V Parshani R Paul G Stanley H E Havlin S 2010 Nature 464 1025
[16] Helbing D 2013 Nature 497 51
[17] Wu J J Gong K Wang C Wang L 2018 Acta Phys. Sin. 67 088901 in Chinese
[18] Pastor-Satorras R Castellano C Van Mieghem P Vespignani A 2015 Rev. Mod. Phys. 87 925
[19] Zhou M Y Xiong W M Liao H Wang T Wei Z W Fu Z Q 2018 Chaos 28 051101
[20] Clusella P Grassberger P Pérez-Reche F J Politi A 2016 Phys. Rev. Lett. 117 208301
[21] Wang Y Chakrabarti D Wang C Faloutsos C 2003 Proceedings of the 22nd International Symposium on Reliable Distributed Systems 117 October, 6–8, 2003 Florence, Italy 25 10.1109/RELDIS.2003.1238052
[22] Castellano C Pastor-Satorras R 2010 Phys. Rev. Lett. 105 218701
[23] Valdano E Ferreri L Poletto C Colizza V 2015 Phys. Rev. 5 021005
[24] Boguñá M Castellano C Pastor-Satorras R 2013 Phys. Rev. Lett. 111 068701
[25] Stegehuis C van der Hofstad R van Leeuwaarden J S H 2016 Sci. Rep. 6 29748
[26] Funk S Gilad E Watkins C Jansen V A A 2009 P. Natl. Acad. Sci. 106 6872
[27] Gleeson J P O’Sullivan K P Baños R A Moreno Y 2016 Phys. Rev. 6 021019
[28] Min B Goh K I Vazquez A 2011 Phys. Rev. 83 036102
[29] Gernat T Rao V D Middendorf M Dankowicz H Goldenfeld N Robinson G E 2018 Proc. Natl. Acad. Sci. USA 115 1433
[30] Wu J Zheng M Zhang Z K Wang W Gu C Liu Z 2018 Chaos 28 033113
[31] Del Vicario M Bessi A Zollo F Petroni F Scala A Caldarelli G Stanley H E Quattrociocchi W 2016 Proc. Natl. Acad. Sci. USA 113 554
[32] Mei R J Ding L An X M Hu P 2019 Chin. Phys. 28 028701
[33] De Domenico M Lima A Mougel P Musolesi M 2013 Sci. Rep. 3 2980
[34] Cho E Myers S A Leskovec J 2013 Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining 2011 San Diego, USA 1082 10.1145/2020408.2020579
[35] Newman M E J 2010 Networks: An Introduction 1 New York Oxford university press 667 669 10.1093/acprof:oso/9780199206650.001.0001
[36] ATLAS Collaboration 2012 Phys. Lett. 716 1
[37] CMS Collaboration 2012 Phys. Lett. 716 30
[38] Granovetter M 1978 Am. J. Sociol. 83 1420
[39] Watts D J 2002 Proc. Natl. Acad. Sci. USA 99 5766
[40] Barthélemy M Barrat A Pastor-Satorras R Vespignani A 2004 Phys. Rev. Lett. 92 178701